A quantitative evaluation of the influence of sampling on the numerical fractal analysis of
experimental profiles is of critical importance. Although this aspect has been widely recognized, a
systematic analysis of the sampling influence is still lacking. Here we present the results of a system- 
atic analysis of synthetic self-affine profiles in order to clarify the consequences of the application of 
a poor sampling (up to 1000 points) typical of Scanning Probe Microscopy for the characterization 
of real interfaces and surfaces. We interpret our results in terms of a deviation and a dispersion of
the measured exponent with respect to the "true" one. Both the deviation and the dispersion have 
always been disregarded in the experimental literature, and this can be very misleading if results 
obtained from poorly sampled images are presented. We provide reasonable arguments to assess 
the universality of these effects and we propose an empirical method to take them into account. We 
show that it is possible to correct the deviation of the measured Hurst exponent from the "true" one 
and give a reasonable estimate of the dispersion error. The latter estimate is particularly important
for experimental results, since it is an intrinsic error that depends only on the number of sampling
points and can easily overwhelm the statistical error. Finally, we test our empirical method by
calculating the Hurst exponent for the well-known 1+1 dimensional directed percolation profiles,
with a 512-point sampling. 
I. INTRODUCTION



The characterization of interfaces and of the mechanisms underlying
their formation and evolution is a subject of paramount importance
for a broad variety of phenomena such as crystal growth, rock
fracture, biological growth, vapor deposition, surface erosion by
ion sputtering, cluster assembling, etc. Since the pioneering work
of B.B. Mandelbrot, fractal geometry has been widely used as a model
to describe these physical systems, which are too disordered to be
studied with other mathematical tools but still hold a sort of
"order" in a scale-invariance sense [1, 2, 6]. In particular, the
growth of interfaces resulting from the irreversible addition of
subunits from outside (vapor deposition of thin films, low-energy
cluster beam deposition, etc.) shows a typical asymmetric scale
invariance, because of the existence of a privileged direction (e.g.
the direction of growth). These interfaces belong to the class of
self-affine fractals and they can be described either by the fractal
dimension D or by the well-known Hurst exponent H. If these systems
are the result of a temporally evolving process, they usually also
show a time scale invariance described by the exponent β. Because of
the close relationship between the scaling exponent(s) and the
fundamental mechanisms leading to scale invariance, universality
classes can be defined. An accurate knowledge of H (and β) is
required to identify the universality class of the system and to
gain insight into the underlying formation processes.

The possibility of characterizing the topography of an interface
over a range of scales from the nanometer up to several tens of
microns, in a relatively simple and quick way, by Atomic Force
Microscopy (AFM) and Scanning Tunneling Microscopy (STM) [20] has
stimulated an upsurge of experimental reports claiming self-affine
structures. The abundance of experimental characterizations of
different systems and the limited sampling capability of scanning
probe microscopies (SPM) have drawn the attention of many authors to
the need for an accurate methodological approach to the
determination of the exponent H and of its error, realistically
considering the consequences of the finite sampling inherent to SPM.
Typical sampling with an AFM or an STM is 256 or 512 points per
line, for a maximum of 512 lines. Most of the results published in
the late eighties and early nineties were based upon 256x256-point
data sets, or even smaller ones. Commercially available SPMs offer
today a maximum of 512x512-point resolution, and homemade
instruments hardly go beyond this value.

Many authors have questioned the reliability of the measurement of
the Hurst exponent from a poorly sampled profile. In order to
quantify the influence of the sampling on the determination of H, a
numerical analysis can be performed on artificial self-affine
profiles, generated with a specific algorithm, with a fixed number
of points L and known Hurst exponent H_in. The "true" exponents
(H_in) are then compared with the ones measured directly from the
generated profiles (H_out). Usually a sizeable discrepancy between
the measured H_out and the expected H_in is found. The discrepancy
is not uniform but depends on the value of H_in. As one would
expect, the discrepancy also depends on the number of points L and
it approaches zero for large values of L. In particular, for
L < 1000 the sampling effect is of great importance, since the
discrepancy can be of the order of the exponent itself (100%
relative error). Dubuc et al. have reported that even for values of
L as high as 16384 the discrepancy is still significant.

Although the problem of sampling has been clearly addressed and
discussed, quite surprisingly a systematic analysis of the problem,
considering different generation algorithms, is still lacking. The
dependence of the sampling effect on L has been investigated, and
many different methods for the measurement of H_out have been
considered for different values of H_in in the range [0.1-1].
However, either only a single generation algorithm has been used, or
the results from different generation algorithms have not been
compared [38]. We believe that this comparison is of fundamental
importance.

Indeed, profiles from different generation algorithms can be
considered as different self-affine objects sampled in L points. For
a fixed value of H_in, these objects would all have the same fractal
dimension if they were sampled with an infinite number of points.
The fundamental question at this point is whether the discrepancy of
H_out from H_in, for a finite value of L, is the same for every
self-affine object (i.e. for every generation algorithm). Only an
analysis that considers different self-affine objects has
statistical validity and allows a reliable interpretation of the
results. Up to now, the results obtained in the literature from a
single generation algorithm did not allow a discussion of the nature
of the aforementioned discrepancy, which has been interpreted as an
uncontrollable error affecting the analysis of sampled profiles. The
main conclusion drawn by these authors is the non-reliability of
results obtained from profiles with less than 1024 sampling points
[37].

Our aim is to achieve a deeper understanding of the effects of
sampling in order to answer the question of whether the measurement
of the Hurst exponent with a small number of sampling points is
reliable or not. This point is crucial both for future analyses of
self-affine profiles and for a correct interpretation of the results
already present in the literature.

From a more general point of view, fractality is characterized by
the repetition of somehow similar structures at all length scales
and can be described in its major properties by a single number: the
fractal dimension D. Any finite sampling of a fractal object imposes
both an upper and a lower cut-off on this scale invariance. It has
been shown that these cut-offs introduce a deviation in D, so that
the sampled object has a dimension different from that of the
underlying continuous object. However, it is still unknown whether
the sampling influences in different ways different objects
characterized by the same ideal dimension, thus breaking the sort of
universality that allows a fractal to be identified by its dimension
only.

In this paper we present a systematic analysis considering together
all the generation algorithms found in the literature. The aim of
our analysis is to understand whether the discrepancy of the
measured H_out, for a fixed L and for every generation algorithm, is
completely random or has a universal dependence on H_in. The latter
case can be interpreted as a reminiscence of the fact that a fractal
object is completely characterized by its dimension. The distinction
is of crucial importance because, in the case of a universal
dependence of H_out on H_in, one can empirically correct the
discrepancy of the measured exponents from the "true" ones. Some
authors independently suggested using the H_out vs. H_in curves
directly as a correction, but they considered only one generation
algorithm, without discussing the universal character that these
curves must have in order to be used for any self-affine object
[34].

Conversely, on the basis of our analysis, we will inter- 
pret the discrepancy in terms of two distinct contribu- 
tions: a universal deviation and a random dispersion. We 
will propose a powerful method to correct the universal 
deviation and we will discuss the nature of the disper- 
sion, which is due to both statistical fluctuations and an 
intrinsic sampling effect. The latter turns out to be a 
sort of systematic error that cannot be corrected unless 
one knows the generation algorithm that produced the 
self-affine object. In the case of generic self-affine profiles 
which have not been generated by a specific algorithm, 
such as experimental profiles, the above arguments no 
longer hold. A new procedure to quantify the intrin- 
sic error in the measurement of the Hurst exponent of 
generic self-affine profiles is thus needed. 

On this basis, we will discuss the effect of sampling on the
reliability of the fractal analysis of poorly sampled self-affine
profiles, focusing on both the deviation and the dispersion of the
measured exponents from the ideal ones, and showing that the
conclusion drawn by Schmittbuhl et al., that "... a system size less
than 1024 can hardly be studied seriously, unless one has some
independent way of assessing the self-affine character of the
profiles and very large statistical sampling", was too restrictive
[37]. Moreover, we will point out that the estimate of the intrinsic
error is essential for a correct classification of a process in
terms of universality classes. In fact, in order to distinguish
exponents belonging to different classes, it is necessary to
quantify the error on the measurement. Up to now, the statistical
error or the error of the fit has been used to quantify the error on
the measurement of H. Both the statistical error and the error of
the linear fit can be made very small if a large number of profiles
are averaged. However, if the measurement is likely to be affected
by more subtle intrinsic errors, such as the aforementioned
dispersion due to the sampling, considering only the statistical
error may be seriously misleading. The intrinsic error in many cases
may indeed be much larger than the statistical one.

In the following sections we will present a systematic analysis of
synthetic self-affine profiles with the aim of both achieving a deep
understanding of the effects of sampling and providing
experimentalists with a reliable tool for the fractal analysis of
surfaces and interfaces. To this purpose we have developed a new
automated fitting protocol in order to avoid any arbitrariness in
the measurement. With this new methodology we will study the effects
of sampling, highlighting the main characteristics of the deviation
and the dispersion of the measured exponents. We will present a new,
powerful method to correct the deviation of H_out and to estimate
the error of the measurement. Finally, we will apply our empirical
correction procedure to 512-point profiles created with the directed
percolation (DP) algorithm. This system provides a simple benchmark
to test our protocol and shows the usefulness of the correction.

II. THE AUTOMATED FITTING PROTOCOL 

Self-affine systems occurring in nature are usually profiles or
surfaces. In order to measure their Hurst exponents, the 2+1
dimensional case of surfaces is usually reduced to 1+1 dimensions by
considering the intersection of the surface with a normal plane. The
particular case of in-plane anisotropy results in a dependence of H
on the orientation of the plane with respect to the surface.

Once we have scaled down the analysis to 1+1 dimen- 
sions, the following general properties characterize a self- 
affine profile. If h(x) is the height of the profile in the 
position x, the orthogonal anisotropy can be expressed 
by the scaling relationship: 

$$h(\lambda x) = \lambda^{H}\, h(x) \tag{1}$$

where H ∈ (0, 1) is the Hurst exponent, λ is a positive scaling
factor and the equation holds in a statistical sense [45]. The
fractal dimension D of the profile is related to the Hurst exponent
by the equation D = 2 - H, while the dimension of the surface is
D = 3 - H. The lower H is, the more space-filling the surface is. In
most physical self-affine surfaces, the scale invariance does not
extend to all length scales: there is an upper cut-off above which
the surface is no longer correlated. The length at which this
cut-off appears is defined as the correlation length ξ. In the
present analysis, we consider only profiles whose correlation length
(expressed in number of points) is equal to their length L. To this
purpose we have carefully studied each generation algorithm in order
to guarantee the condition ξ = L. For this reason we were often
forced to generate very long profiles and to consider only their
central portion [38, 47, 48]. The usual procedure to measure the
Hurst exponent of a self-affine profile h(x) is to calculate
appropriate statistical functions from the whole profile. These
analysis functions (AFs) show a typical power-law behavior on
self-affine profiles:

$$\mathrm{AF}[h(\cdot), k] = c\, k^{f(H)} \tag{2}$$

where c is a constant, k is a variable indicating the resolution at
which the profile h is analyzed (typically a frequency or a
spatial/temporal separation), and f(H) is a simple function of the
Hurst exponent H. The power-law behavior of the AF is then fitted in
a log-log plot in order to calculate the exponent H. In the analysis
of statistically self-affine profiles there are random fluctuations
superimposed on this power-law behavior. The signal-to-noise ratio
of these fluctuations is scale dependent, the AFs being calculated
as averages of statistical quantities at different length scales. To
reduce this noise, the average of the AFs obtained from N
independent profiles is usually taken before the linear fit is
performed. However, while small-scale fluctuations are easily
smoothed, larger-scale fluctuations converge very slowly.
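
As a minimal illustration of Eq. (2), the following Python sketch computes a height-height correlation function and fits its slope in a log-log plot. It assumes the common definition C(r) = sqrt(<[h(x+r) - h(x)]^2>), for which f(H) = H. This definition, the function names and the fit over the whole range of lags are assumptions made here for illustration only; they are not taken from the analysis described in this paper.

```python
import numpy as np

def height_height_correlation(h, max_lag=None):
    """Return lags r and C(r) = sqrt(<(h(x+r) - h(x))^2>) for a sampled profile."""
    h = np.asarray(h, dtype=float)
    if max_lag is None:
        max_lag = len(h) // 2          # hypothetical default: half the profile
    lags = np.arange(1, max_lag + 1)
    corr = np.array([np.sqrt(np.mean((h[r:] - h[:-r]) ** 2)) for r in lags])
    return lags, corr

def hurst_from_loglog(lags, corr):
    """Estimate H as the slope of log C(r) vs. log r (naive whole-range fit)."""
    slope, _intercept = np.polyfit(np.log10(lags), np.log10(corr), 1)
    return slope

# Deterministic check: a straight ramp h(x) = x has C(r) = r, hence slope 1.
ramp = np.arange(512, dtype=float)
lags, corr = height_height_correlation(ramp)
h_ramp = hurst_from_loglog(lags, corr)

# A plain random walk is an ordinary Brownian profile (nominal H = 0.5).
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(512))
h_walk = hurst_from_loglog(*height_height_correlation(walk, max_lag=32))
```

For the ramp the slope equals 1 up to rounding. For a single 512-point random walk the estimate only scatters around 0.5, which is the kind of fluctuation that averaging the AFs over N profiles is meant to reduce.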

The identification of the linear region in the analysis of the AFs
is a delicate point. Windowing saturation is present at length
scales comparable with the profile length, depending on the nature
of the profiles [49]. This results in a departure from the power-law
behavior towards a constant value. Moreover, the degradation of the
fractality due to the sampling causes a deviation of the AFs from
their ideal power-law behavior. This produces both a discrepancy of
the measured Hurst exponent from the ideal value (a change of the
slope in the log-log plot) and a shortening of the linear region, as
shown in Fig. 1. Here, the presence of curved regions is clearly
visible. It can be seen that this anomalous behavior is not
localized at length scales close to the length of the profile, but
also involves the shortest length scales, especially for values of H
close to zero. It is important to notice that this effect is not due
to experimental conditions, such as the finite size of the SPM
scanning probe. Thus it is necessary, in particular for small values
of H, to choose a linear region instead of fitting the whole
function. The methods proposed in the literature to identify the
linear region (e.g. the consecutive slopes method, the correlation
index method [54], the coefficient of determination method and the
"fractal measure" method) are usually based on an arbitrary (human)
choice. This is particularly delicate since the curvature in the AFs
can be so small, compared to the statistical noise, that it is hard
to distinguish the correct linear region. For this reason, we think
that the proposed methods suffer from a high degree of
arbitrariness. Moreover, all these methods make no distinction
between a straight line with statistical noise and a slightly curved
line.

Due to the previous arguments and since no universally accepted
fitting procedure is available in the literature, we were prompted
to develop an automated fitting protocol (AFP) with two purposes: to
reduce as much as possible the effects of the curved regions on the
measured exponent, and to define a standard algorithm for the choice
of the linear region, eliminating, as much as possible, any
arbitrariness. This is very important for the reliability of the
results, in particular for the comparison of different generation
algorithms. Moreover, the automation of the fitting procedure is
essential to perform a systematic analysis. In fact, in order to
have good statistics, a large number of AFs must be calculated and
fitted.

FIG. 1: Average height-height correlation function Ci calculated
from N = 500 profiles of L = 512 points, generated with the random
addition method with Hurst exponent H_in = 0.1. The linear region
and the fit obtained with the automated fitting protocol (AFP) are
also shown. One can clearly see the overall curved shape due to the
sampling.

In our procedure, which is an implementation of the consecutive
slopes algorithm, the curve to be fitted is divided into many
portions of the same length ℓ (in number of points) and each of them
is considered separately. A linear and a cubic fit are performed on
each portion. Comparing the mean distance of the linear fit from the
portion to the mean distance of the cubic fit from the linear fit,
we evaluate whether the portion is almost linear with uncorrelated
noise or presents a definite curvature. Obviously, the distinction
is not clear-cut and we have to set a threshold to separate the two
cases through a parameter in the fitting procedure. The use of a
parameter is common to other methods (see for example the
coefficient of determination method). Once the fitting parameter is
set, our procedure is able to decide automatically whether the
portion is "curved" or "linear". Only the "linear" portions are then
considered. They undergo a straight-line fit through which the
slopes and their errors are determined. A distribution of the slopes
weighted with the values of the errors is then built (see Fig. 2(a))
and its main peak position and width are measured. We do not
consider here the presence of more than one linear region with
different slopes. Thus, there is a well-defined main peak in the
distribution. We have also extended our procedure to the case of
more than one linear region, but this extension is beyond the scope
of this article.
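
The linear-versus-curved decision described above can be sketched in Python as follows. This is only a hedged illustration: the text does not specify how the two mean distances are compared, so the ratio test, the default threshold of 0.5, the sliding (overlapping) windows and all function names are assumptions of this sketch, not the actual AFP.

```python
import numpy as np

def classify_portion(x, y, threshold=0.5):
    """Label one portion of a log-log curve as 'linear' or 'curved'.

    Assumed criterion: the portion is 'curved' when the mean distance
    between the cubic and the linear fit exceeds `threshold` times the
    mean distance between the data and the linear fit.
    """
    lin = np.polyval(np.polyfit(x, y, 1), x)
    cub = np.polyval(np.polyfit(x, y, 3), x)
    scatter = np.mean(np.abs(y - lin))   # data scatter around the straight line
    bend = np.mean(np.abs(cub - lin))    # systematic departure from the line
    return "curved" if bend > threshold * scatter else "linear"

def linear_portion_slopes(x, y, width, threshold=0.5):
    """Slide a window of `width` points; return slopes of 'linear' portions."""
    slopes = []
    for start in range(len(x) - width + 1):
        xs, ys = x[start:start + width], y[start:start + width]
        if classify_portion(xs, ys, threshold) == "linear":
            slopes.append(np.polyfit(xs, ys, 1)[0])
    return np.array(slopes)

# Synthetic test curves (hypothetical data, not taken from the paper).
x = np.linspace(0.0, 1.0, 200)
rng = np.random.default_rng(0)
straight = 2.0 * x + 0.01 * rng.standard_normal(x.size)  # line plus noise
bent = x ** 2                                             # pure curvature

label_straight = classify_portion(x, straight)
label_bent = classify_portion(x, bent)
slopes = linear_portion_slopes(x, straight, width=100)
```

For pure noise around a line the cubic fit departs little from the linear one, so the ratio stays small; for a genuinely bent curve the cubic fit absorbs the curvature and the ratio approaches one. The slopes returned for the accepted portions are the raw material for the slope distribution discussed in the text.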

The procedure described above is repeated varying the length ℓ of
the portions from a minimum value ℓ_min up to the length of the
curve. The results are then shown in a plot of the peak position
(i.e. a slope value) versus the length of the portion, with the peak
widths as error bars (see Fig. 2(b)). If the analyzed curve presents
a linear region, this plot shows a plateau for ℓ ranging from ℓ_min
to the length of the whole linear region. This plateau is usually
very easy to identify because of the distinction between linear and
curved portions. In fact, portions longer than the whole linear
region are considered curved portions and discarded. Thus, the plot
usually drops to zero at the end of the plateau. Finally, through an
average and a standard deviation, we obtain the final slope value
and its fitting error, while the length of the plateau gives the
length of the linear region. In conclusion, our AFP is able to
identify not only the slope of the linear region but also its
length. We have tested our AFP before its application to the
systematic analysis and we have found that the measured Hurst
exponent is largely independent of the fitting parameter [64].
Conversely, the length of the linear region strongly depends upon
the value of the parameter and must be considered only an internal
parameter of the analysis and not a direct measurement of the scale
invariance range.



III. NUMERICAL ANALYSIS 

FIG. 2: Step-by-step application of the fitting protocol: (a) the
distribution of the slopes for a single value of the length ℓ of the
portion (ℓ = 0.35 decades) and (b) the final plot of the slopes
(peak positions) vs. ℓ, with an inset magnification showing the
error bars.

With all the generation algorithms published in the literature we
have created sampled self-affine profiles with known fractal
dimension D = 2 - H. We have varied the exponent H between 0.1 and 1
and we have focused on the value L = 512 sampling points (the best
sampling obtainable with most SPMs). We also discuss different
values of L up to 16384. Because there exist only a few algorithms
that generate exactly self-affine profiles, we have used algorithms
that generate statistically self-affine profiles, which are more
difficult to handle but closer to natural physical systems. The
algorithms we have used are known in the literature as: the random
midpoint displacement [37, 57], the random addition algorithm, the
fractional Brownian motion, the Weierstrass-Mandelbrot function, the
inverse Fourier transform method [57] and a variation of the
independent cut method [40]. For the measurement of the Hurst
exponent of self-affine profiles we have used the height-height
correlation function Ci [49] and the root mean square variable
bandwidth method with fit subtraction. The value of H_out has been
calculated from the slope in the log-log plot of the average over N
statistically independent AFs, measured with our AFP.
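
To make the notion of a generation algorithm concrete, the following Python sketch implements a simplified random midpoint displacement. It is a textbook-style illustration and not the implementation used for the results reported here: the displacement amplitude is simply reduced by a factor 2^(-H) at each halving of the spacing, and the function name, arguments and seed handling are hypothetical.

```python
import numpy as np

def midpoint_displacement(levels, hurst, sigma=1.0, seed=None):
    """Generate an approximately self-affine profile with 2**levels + 1 points."""
    rng = np.random.default_rng(seed)
    n = 2 ** levels
    h = np.zeros(n + 1)
    h[n] = sigma * rng.standard_normal()   # right end point; h[0] stays at 0
    step = n
    scale = sigma
    for _ in range(levels):
        half = step // 2
        # The displacement amplitude shrinks by 2**(-hurst) per halving.
        scale *= 0.5 ** hurst
        mids = np.arange(half, n, step)    # indices of the new midpoints
        h[mids] = (0.5 * (h[mids - half] + h[mids + half])
                   + scale * rng.standard_normal(len(mids)))
        step = half
    return h

# A 513-point profile with nominal exponent H_in = 0.5 (arbitrary example).
profile = midpoint_displacement(levels=9, hurst=0.5, seed=1)
```

A longer profile can be generated in the same way and only its central portion retained, in the spirit of the condition ξ = L discussed in Section II.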

The results are expressed in terms of H_out vs. H_in plots. Each
plot is characteristic of a single AF and generation algorithm and
it represents the relationship between the measured Hurst exponent
H_out, calculated from the average of N AFs, and the nominal
exponent H_in of the profile. Grouping the H_out vs. H_in plots
obtained using the same AF for all the generation algorithms, the
dispersion of the H_out values becomes evident. In Fig. 3 we show
the H_out vs. H_in graphs obtained from N = 500 profiles of L = 512
points, as explained in the previous section. We show separately in
Figs. 3(a) and 3(b) the different AFs used. Since the profiles are
statistically self-affine, the measured H_out are subject to a
statistical error that is inversely related to N [42]. In order to
characterize the dependence of this statistical error on the number
N of averaged AFs, we let N vary from 1 to 50 using the same
profiles considered in Fig. 3. With these values of N we have
repeated the numerical analysis (i.e. calculation of the AFs,
averaging and application of the AFP) and we have extracted a
standard deviation σ_N of the measured exponents. In Fig. 4 we show
the H_out vs. H_in graphs, analogous to those in Fig. 3, with the
calculated error bars (twice the standard deviation σ_N), for a few
values of N. We present the results for a single AF (the root mean
square variable bandwidth with fit subtraction), the results for the
other AFs being similar. In Fig. 5 we show three H_out vs. H_in
graphs obtained respectively with N = 500, L = 512 profiles, N = 50,
L = 4096 profiles and N = 15, L = 16384 profiles. Again, we present
only one AF (the height-height correlation function Ci).

FIG. 3: H_out vs. H_in graphs calculated from N = 500 profiles of
L = 512 points each: (a) height-height correlation function and (b)
root mean square variable bandwidth (with fit subtraction). The
black dotted line represents the ideal H_out = H_in behavior. The
other line styles refer to different generation algorithms: random
midpoint displacement (black continuous line), inverse Fourier
transform (black dashed line), random addition (black dash-dotted
line), Weierstrass-Mandelbrot (grey continuous line), fractional
Brownian motion (grey dashed line) and independent cut (grey
dash-dotted line).



IV. RESULTS AND DISCUSSION: DEVIATION 
AND DISPERSION FROM THE IDEAL 
BEHAVIOR 

Ideal continuous fractal profiles are statistically characterized by
their fractal dimension (universality) and their H_out vs. H_in
graphs are straight lines.

In Fig. 3 a deviation from the ideal behavior is observed for both
the AFs. It turns out that the sampling of a profile affects
different methods of analysis in different ways. The deviation from
the ideal behavior has already been observed in the literature (for
example, see Ref. [37]) and our results are in good agreement with
the previous ones.

Moreover, within the same method of analysis we observe that the
different generation algorithms give significantly different H_out
vs. H_in plots. This dispersion is pointed out here for the first
time because different generation algorithms are considered
together. The significance of the dispersion can be inferred from
the characterization of the statistical error of the measured
exponent discussed hereafter.

In Fig. 4 we show that for N > 25 and H_out < 0.3 the error bars of
H_out for different generation algorithms hardly overlap. This fact
suggests that the statistical error is not the only reason for the
differences between the H_out vs. H_in plots shown in Fig. 3. In
Fig. 6 we plot the statistical error σ_N times the square root of N
vs. N. For N > 10 the curves approach a constant value, in
accordance with the relationship between the standard deviation σ of
independent, normally distributed measurements and the standard
deviation of the mean over N measurements:
$$\sigma_N = \frac{\sigma}{\sqrt{N}} \tag{3}$$



FIG. 4: H_out vs. H_in graphs with error bars equal to twice the
standard deviation σ_N of the measured exponents. These graphs
correspond to different values of the number N of statistically
independent profiles from which an average Hurst exponent is
measured: (a) N = 1, (b) N = 10 and (c) N = 50. It can be seen that
for N > 10 and for H_in < 0.3 the overlap between the error bars
corresponding to different generation algorithms is small or
completely absent. For the sake of clarity we do not distinguish
between different generation algorithms.

FIG. 5: H_out vs. H_in graphs calculated with the height-height
correlation function from: (a) N = 500, L = 512 profiles, (b)
N = 50, L = 4096 profiles, (c) N = 15, L = 16384 profiles. Line
styles are the same as in Fig. 3.

FIG. 6: Graph of the statistical standard deviation σ of the Hurst
exponent, obtained from the definition of the standard deviation of
the mean (Eq. (3)), vs. the number N of averaged AFs. The saturation
for values of N larger than 25 can be clearly seen for almost all
the generation algorithms.

This result shows that the AFP and the averaging of the AFs do
commute. The assessment of this property is nontrivial due to the
complexity of the AFP. Thus, we extrapolate the statistical error of
the measured exponents in Fig. 3 (N = 500) using Eq. (3), where σ is
extracted from the plateau in Fig. 6. Overestimating σ with the
value 0.16 we obtain σ_500 = 0.007. This value produces an error bar
in Fig. 3 as small as the symbol used to mark the data. A direct
calculation of σ_500, obtained by averaging AFs calculated on groups
of N = 500 profiles for every H ∈ [0.1, 1] and for every generation
algorithm, fitting and extracting a mean value and a standard
deviation of H, would have required a huge and time-consuming
calculation.
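
As a quick arithmetic check of the extrapolated value quoted above, and assuming that Eq. (3) is the usual relation between the standard deviation σ of single measurements and the standard deviation of the mean over N measurements, the numbers work out as follows:

```latex
\sigma_{500} = \frac{\sigma}{\sqrt{500}} \approx \frac{0.16}{22.36} \approx 0.007
```

The quoted value σ_500 = 0.007 is therefore consistent with the overestimate σ = 0.16.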

These results suggest that the observed dispersion between the H_out
vs. H_in curves for different generation algorithms is an intrinsic
effect of the sampling, depending only on the number of sampling
points L. This fact has an important consequence for the fractal
analysis of experimental surfaces. When looking at a real sample, we
do not know what kind of "algorithm" has generated the surface. This
introduces an uncertainty on its real fractal dimension that is
independent of the statistical error. Thus, there is an intrinsic
upper limit to the precision of the measurement of the exponent. It
is useless to improve the statistics once the number of acquired
profiles makes the statistical error smaller than the intrinsic
dispersion.

In Fig. 5 we see that as L increases both the deviation and the
dispersion decrease, in agreement with their expected vanishing in
the limit of L going to infinity [37]. This is also an a posteriori
confirmation of the correctness of both the generation algorithms
and the methods of analysis.

Our interpretation of these effects is that the sampling of a
self-affine profile lessens its fractality in such a way that it is
no longer characterized universally by its fractal dimension (or
Hurst exponent). While for a continuous self-affine profile the
relationship H_out = H_in holds, for sampled profiles we can see
that different AFs produce different H_out vs. H_in plots from the
same sampled fractal profile. Considering instead a single AF, our
results show that sampled fractal profiles generated with different
generation algorithms but with the same ideal dimension give
different measured Hurst exponents.

However, Fig. 3 clearly shows that the lessening of fractality of a
profile is a continuous process rather than a sharp transition: the
poorer the sampling, the worse the deviation and the dispersion. In
Figs. 5 and 3 we observe that the lessening of fractality acts in a
similar way on profiles generated with different algorithms. The
common trend of the H_out vs. H_in curves obtained from different
generation algorithms is interpreted as a consequence of the
universality of fractal objects.

It is then reasonable to assume the existence, for every AF, of a
universal region in the H_out-H_in plane containing all the H_out
vs. H_in plots obtained with every possible generation algorithm.
This region, approximately identifiable with the envelope of the
H_out vs. H_in plots, has a width that depends on the number of
sampling points and approaches the one-dimensional H_out = H_in
ideal curve for very large values of L. We expect that, given any
continuous self-affine profile with a Hurst exponent H_in and given
the exponent H_out measured from an L-point sampling of the
continuous profile, the pair (H_in, H_out) belongs to the universal
region of the corresponding graph (specific for every AF and number
of sampling points L). Provided a good characterization of the
aforementioned regions is available (i.e. using as many generation
algorithms as possible), we can use them to generate calibration
graphs for every L and AF describing the relationship between the
measured H_out and the true value H_in.

To produce the calibration graphs we proceed as follows.
First of all, we make two general assumptions in order to
take quantitatively into account the problem of measuring
the Hurst exponent of a sampled profile. We assume that the
$H_{\mathrm{out}}$ values corresponding to the same
$H_{\mathrm{in}}$ are normally distributed around a mean
$\langle H_{\mathrm{out}} \rangle$, and we also assume that
the values obtained with the available generation algorithms
are a random sampling of this Gaussian distribution. We then
measure the average and the standard deviation of the
dispersed $H_{\mathrm{out}}$ values corresponding to each
$H_{\mathrm{in}}$ separately. Thus we obtain a sampling of
the functions describing the dependence of
$\langle H_{\mathrm{out}} \rangle$ and
$\sigma_{H_{\mathrm{out}}}$ on $H_{\mathrm{in}}$. With an
interpolation algorithm using smooth functions, we derive the
curve representing the relationship between
$\langle H_{\mathrm{out}} \rangle$ and $H_{\mathrm{in}}$. We
also derive the pair of curves corresponding to
$\langle H_{\mathrm{out}} \rangle + n\sigma_{H_{\mathrm{out}}}$
and
$\langle H_{\mathrm{out}} \rangle - n\sigma_{H_{\mathrm{out}}}$
vs. $H_{\mathrm{in}}$, which define the n-th confidence
level. For every value of $H_{\mathrm{out}}$ it is possible
to find the confidence interval of $H_{\mathrm{in}}$ for any
given confidence level. The resulting graphs for L = 512 are
shown in Fig. 7. These calibration graphs allow one to take
into account the deviation and the dispersion due to the
sampling. A similar method has been independently proposed
in Ref. [34], even though the analysis was limited to a
single generation algorithm and the discussion of the
reliability of the calibration regions, together with the
intrinsic dispersion, was completely neglected.
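
The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for Fig. 7: it replaces the smooth interpolating functions with plain linear interpolation (`numpy.interp`), it assumes that the mean curve and both band curves increase monotonically with the true exponent, and all function and variable names are hypothetical.

```python
import numpy as np

def build_calibration(h_out_by_algorithm):
    """Mean and standard deviation of H_out at each H_in.

    h_out_by_algorithm: array of shape (A, K), one row per generation
    algorithm and one column per true exponent H_in.
    """
    h_out = np.asarray(h_out_by_algorithm, dtype=float)
    mean = h_out.mean(axis=0)
    # Sample standard deviation across the available algorithms.
    std = h_out.std(axis=0, ddof=1)
    return mean, std

def _invert(curve, h_in, h_measured):
    """H_in at which a monotonically increasing curve equals h_measured."""
    if np.any(np.diff(curve) <= 0):
        raise ValueError("calibration curve must increase with H_in")
    # np.interp clamps to the end points outside the calibrated range.
    return float(np.interp(h_measured, curve, h_in))

def corrected_interval(h_measured, h_in, mean, std, n=1.0):
    """Return (lower, central, upper) estimates of H_in.

    The upper band <H_out> + n*sigma reaches h_measured at the smallest
    H_in, so it gives the lower bound; the lower band gives the upper one.
    """
    h_in = np.asarray(h_in, dtype=float)
    central = _invert(mean, h_in, h_measured)
    lower = _invert(mean + n * std, h_in, h_measured)
    upper = _invert(mean - n * std, h_in, h_measured)
    return lower, central, upper
```

With n = 1 and the normality assumption stated above, the returned interval corresponds to a confidence level of about 68%.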

FIG. 7: Calibration graphs $H_{\mathrm{out}}$ vs.
$H_{\mathrm{in}}$ for the methods of analysis used in this
article: (a) height-height correlation function and (b)
variable bandwidth with fit subtraction. From the value of
the measured exponent, one can easily extract the
corresponding confidence interval of the corrected exponent,
as represented graphically in (a).

Using the calibration graphs it is possible to measure
the Hurst exponent of poorly sampled profiles, correcting
for the first time the deviation due to the sampling and
providing a reasonable estimate of the error on a
confidence level basis. The quantification of the error is of
paramount importance, as pointed out in the introduction,
since many authors estimated the error from the
precision of the linear fit 0, ^3 or from the standard 
deviation of the measured exponents [42]. Our results
show that they usually underestimated the true error. 

V. APPLICATION OF THE CALIBRATION GRAPHS TO THE STUDY OF DIRECTED PERCOLATION NUMERICAL PROFILES

We have applied our procedure to the 1+1 dimensional
directed percolation (DP) model described by S.V.
Buldyrev et al. [44]. This model mimics the wetting of
paper by a fluid. The resulting pinned interface is
self-affine with exponent H ≈ 0.63.

We have analyzed N = 30 DP profiles of L = 16384 points
with the height-height correlation function (h.-h. corr.)
and the variable bandwidth with fit subtraction (vbw), using
the automated fitting protocol to measure the Hurst
exponents. The results are shown in the second column of
Tab. I. We have not calculated the statistical error (see
Section IV) because it would have been excessively time
consuming. Thus, the error shown is simply the error of
the fit calculated with the AFP. The values of the measured
exponents $H_{\mathrm{out}}^{16384}$ are significantly lower
than the value predicted by the DP model, suggesting that a
correction is needed even in the case of profiles of
L = 16384 points, which are widely considered as continuous.
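
For readers who want to reproduce this kind of measurement, the sketch below estimates the Hurst exponent from the height-height correlation function, using the standard scaling C(r) = <[h(x+r) - h(x)]^2> ~ r^(2H). It is a simplified illustration: the fit range is a fixed window chosen by hand (the `r_max` argument, an assumption), not the automated fitting protocol used in this work, and the function names are hypothetical.

```python
import numpy as np

def height_height_correlation(h, r_max):
    """C(r) = mean of [h(x + r) - h(x)]^2 for integer lags r = 1..r_max."""
    h = np.asarray(h, dtype=float)
    if not 1 <= r_max < len(h):
        raise ValueError("r_max must satisfy 1 <= r_max < len(h)")
    lags = np.arange(1, r_max + 1)
    c = np.array([np.mean((h[r:] - h[:-r]) ** 2) for r in lags])
    return lags, c

def hurst_from_hhcorr(h, r_max=None):
    """Hurst exponent from a straight-line fit of log C(r) vs. log r."""
    if r_max is None:
        # Assumed default: fit up to one tenth of the profile length.
        r_max = max(2, len(h) // 10)
    lags, c = height_height_correlation(h, r_max)
    slope, _intercept = np.polyfit(np.log(lags), np.log(c), 1)
    return slope / 2.0  # C(r) ~ r^(2H), so H is half the log-log slope
```

Applying `hurst_from_hhcorr` to a profile and to shorter segments cut from it gives a direct way to see how the measured exponent changes with the number of sampling points.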

We have then analyzed N = 1000 profiles of L = 512 points
extracted from the L = 16384 profiles. We have applied
the correction procedure based on the calibration graphs
shown in Fig. 7 to the exponents measured with the AFP.
In the third column of Tab. I the uncorrected measured
exponents ($H_{\mathrm{out}}^{512}$) are shown. The error is
calculated as the root mean square (rms) value of the
statistical error $\sigma_{1000}$ (evaluated as explained in
Section IV) and the error of the fit calculated with the AFP.
In the fourth column, the confidence intervals corresponding
to the 68% probability for the "true" exponents are shown
($H_{\mathrm{in}}^{512}$ (68%)).

The results summarized in Tab. I demonstrate the
effectiveness of the calibration graphs in the analysis of
self-affine profiles when the effects of sampling are
non-negligible. In the example reported here, the poor
sampling causes a discrepancy of about 4% between the
measured exponents and the theoretical one for DP profiles.
After the correction with the calibration graphs, the
expected value $H_{\mathrm{in}}$ ≈ 0.63 is consistent with
the confidence intervals of the three AFs. Moreover, the
intrinsic error due to the dispersion (about half the width
of the confidence interval) turns out to be usually one order
of magnitude larger than the aforementioned rms error.
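
The comparison described above can be checked directly from the L = 512 entries of Tab. I. The short sketch below only repeats that arithmetic for the two methods listed in the table; the dictionary layout and the function name are illustrative choices.

```python
H_DP = 0.63  # theoretical DP exponent quoted in the text

# Per method: (measured H_out for L = 512, its rms error,
#              68% confidence interval for H_in), as listed in Tab. I.
TABLE_I = {
    "h.-h. corr.": (0.609, 0.002, (0.613, 0.635)),
    "vbw": (0.608, 0.012, (0.611, 0.644)),
}

def summarize(h_out, rms_error, interval, h_true=H_DP):
    """Compare a measured exponent with the expected one, before and
    after the calibration-graph correction."""
    lo, hi = interval
    return {
        # Fractional deviation of the uncorrected exponent.
        "relative_discrepancy": (h_true - h_out) / h_true,
        # Is the expected value within the quoted rms error?
        "consistent_before": abs(h_true - h_out) <= rms_error,
        # Is the expected value inside the corrected 68% interval?
        "consistent_after": lo <= h_true <= hi,
        # Rough size of the intrinsic (dispersion) error.
        "half_width": (hi - lo) / 2,
    }

for name, row in TABLE_I.items():
    print(name, summarize(*row))
```

For both methods the expected value lies outside the uncorrected error bar and inside the corrected interval; the relative discrepancies evaluate to roughly 3.3% and 3.5%.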

In conclusion, our calibration graphs have allowed us to
correct the deviation and to quantify the intrinsic error
of the Hurst exponent of poorly sampled (L = 512) DP
profiles.



VI. CONCLUSIONS 

We have carried out a systematic analysis in order to
achieve a deeper understanding of the effects of sampling
on the measurement of the Hurst exponent of self-affine
profiles. This is a crucial point for the assessment of
the reliability of fractal analysis of experimental profiles,
such as topographic profiles of growing thin films and
interfaces acquired with a Scanning Probe Microscope.
We have pointed out that some of the steps leading to
the measurement of the Hurst exponent have been only
superficially discussed, although they deserve deeper
attention. We have focused on the quantification of the
effects of sampling and, where possible, on their correction,
allowing a more reliable identification of the universality
class of growth.

In order to perform such a quantitative analysis we
have developed a new automated fitting protocol that
removes the ambiguity in the choice of the region
for the linear fit of the analysis functions. This point
is usually underestimated in the published experimental
literature, and appears to be a significant source of error
in the whole analysis. Moreover, an automated protocol
considerably reduces the time required for the fitting of a
large number of noisy curves, allowing better statistics.
With our automated fitting protocol we have systematically
investigated synthetic self-affine profiles generated with
all the generation algorithms found in the literature, using
different methods of analysis.

The systematic analysis presented in this paper has 
been carried out on 1+1 dimensional profiles and we 
have not considered 2-dimensional methods of analysis 
(e.g. see [13 El)- However, it is reasonable to suppose 
that even in this case the effects of sampling cannot be 
neglected, and the conclusions drawn in Ref. [34] are
probably incorrect. The similarity between Fig. 1 in
Ref. [34] and the analogous results presented in this paper
(see the variable bandwidth analysis of profiles generated
with the random midpoint displacement shown in
Fig. Et) suggests that conclusions very close to those 
presented here can be drawn also in the 2-dimensional 
case. 

Studying the discrepancy between the measured Hurst
exponent $H_{\mathrm{out}}$ and the "true" one
($H_{\mathrm{in}}$) for synthetic self-affine profiles with
L = 512 points, we have shown that the main effects of
sampling are a deviation of the $H_{\mathrm{out}}$ vs.
$H_{\mathrm{in}}$ plots from the ideal behavior and a
dispersion of the exponents calculated from different
generation algorithms. Both these effects decrease smoothly
with increasing values of L. The deviation turns out to
be universal in the sense that the trend of the
$H_{\mathrm{out}}$ vs. $H_{\mathrm{in}}$ curves is common to
all of the generation algorithms, depending only on the
number of sampling points and on the function used in the
analysis. We propose that this behavior is reminiscent of the
fact that a fractal object is completely characterized by its
dimension, and therefore the deviation can be corrected, at
least empirically. The dispersion instead has to be
considered as an intrinsic error due to the sampling, except
for the very special case of profiles whose generation
algorithm allows one to build their specific
$H_{\mathrm{out}}$ vs. $H_{\mathrm{in}}$ plot. This
dispersion error must be quantitatively taken into account,
since it cannot be reduced with an increase in the statistics
but only with an increase in the number of sampling points.

TABLE I: Measured Hurst exponents of sampled DP profiles (theoretical value: H ≈ 0.63 [44])

               $H_{\mathrm{out}}^{16384}$ (a)   $H_{\mathrm{out}}^{512}$ (b)   $H_{\mathrm{in}}^{512}$ (68%)
h.-h. corr.    0.615 ± 0.004                    0.609 ± 0.002                  [0.613 - 0.635]
vbw            0.620 ± 0.003                    0.608 ± 0.012                  [0.611 - 0.644]

(a) The error for L = 16384 is the error of the fit.
(b) The error for L = 512 is the rms value of the statistical error
and the error of the fit.

The existence of an intrinsic dispersion error in the
measurement of the Hurst exponent that depends only
on the number of sampling points is very important. In
fact, this intrinsic error easily overwhelms the statistical
error for poorly sampled profiles. It is definitely clear
that a reliable result cannot be based on the consideration
of the statistical error only. Moreover, the dispersion
poses an upper limit to the precision in the measurement
of the Hurst exponent of sampled profiles. It becomes
useless to increase the statistics once the statistical error
has been made reasonably smaller than the intrinsic one.
This is particularly important in an experimental analysis
because it usually reduces significantly the number
of profiles that have to be acquired, making the analysis
much less time consuming.

Thanks to our systematic analysis, we have built, for
each method of analysis, a calibration graph representing
the region of the $H_{\mathrm{out}}$-$H_{\mathrm{in}}$ plane
where the true exponents fall within a given confidence
level. We have proposed to use these graphs as an original
and reliable empirical method to correct the measured value
of the Hurst exponent of a poorly sampled profile and to
estimate its intrinsic sampling error. The reliability of the
calibration graphs is based on two assumptions:

i) The measured exponents for all the possible
self-affine profiles, with the same "true" exponent
$H_{\mathrm{in}}$ and with the same number of sampling
points, are normally distributed;

ii) The numerical generation algorithms known in the
literature provide a statistically reliable sample of all
the possible self-affine profiles.

Even though we have found just six generation algorithms
in the literature, we believe that they still allow us to
obtain reasonable results, or at least the only ones
obtainable to date. These results represent a step forward
towards a reliable fractal analysis of both numerical and
experimental profiles and towards the identification of the
universality classes in the study of the evolution of many
different systems.

In conclusion, we have demonstrated that a reliable
measurement of the Hurst exponent of poorly sampled
self-affine profiles is possible, provided that the measured
$H_{\mathrm{out}}$ is corrected for its deviation and that
the sampling error is quantitatively taken into account. We
have thus given strength to experimental analyses, since the
numerical results reported in the literature to date led to
the conclusion that the analysis of self-affine profiles
sampled with fewer than 1000 points is not reliable [37].
Even with the great improvement introduced by the use of the
calibration graphs in the analysis of self-affine profiles,
we definitely agree with Schmittbuhl et al. in pointing out
that the comparison of the results obtained with different
methods of analysis is of fundamental importance [37].
Furthermore, we briefly comment on the common experimental
procedure of connecting AFs calculated from
profiles acquired with different scan sizes |4lL l43l |6l|. 
This connection allows investigating a wider range of 
length scales with a limited number of sampling points 
and makes the measurement more reliable. However, the 
deviation and dispersion are not influenced by this proce- 
dure, since they depend only on the number of sampling 
points of the profiles on which the AFs are calculated. 

The AFP and the calibration graphs have been tested
on numerically generated 1+1 dimensional directed
percolation (DP) profiles, which have provided a benchmark
to check our protocol. We have shown that for L = 512
profiles a correction is needed and that the calibration
graphs allow one to recover the theoretical value of H
predicted by the DP model. We have also shown that a
correction is needed even for the L = 16384 profiles, which
are widely considered as continuous.

Our results provide a powerful tool for the accurate
extraction of the Hurst exponent from poorly sampled
profiles, and for the quantification of the error in the
measurement. This is of paramount importance for
experimentalists who study the scale invariance of surfaces
and interfaces by Scanning Probe Microscopy or other
techniques, with the aim of identifying the underlying
universality classes. The huge amount of experimental
results published in the past two decades about the
fractality of many interfaces can now be re-examined in a
new light.